NSF PAR Search | NSF Public Access Repository


PLI+: efficient clustering of cloud databases

https://doi.org/10.1007/s10619-018-7252-2

Ton That, Dai Hai; Wagner, James; Rasin, Alexander; Malik, Tanu (October 2018, Distributed and Parallel Databases)

Commercial cloud database services increase the availability of data and provide reliable access to it. Routine database maintenance tasks such as clustering, however, increase the cost of hosting data on commercial cloud instances. Clustering causes an I/O burst; clustering in one shot depletes the I/O credits accumulated by an instance and increases the cost of hosting data. An unclustered database, in turn, degrades query performance because queries scan large amounts of data, gradually depleting I/O credits. In this paper, we introduce Physical Location Index Plus (PLI+), an indexing method for databases hosted on commercial cloud services. PLI+ relies on internal knowledge of data layout to build a physical location index, which maps ranges of physical locations to ranges of attribute values to create approximately sorted buckets. As new data is inserted, writes are partitioned in memory based on the incoming data distribution. The data is then written to physical locations on disk in block-based partitions to favor large-granularity I/O. Incoming SQL queries on indexed attribute values are rewritten in terms of the physical location ranges. As a result, PLI+ does not decrease query performance on an unclustered cloud database instance, and DBAs may choose to cluster the instance only when they have sufficient I/O credits available, thus delaying the need for clustering. We evaluate query performance over PLI+ by comparing it with clustered indexes, unclustered (secondary) indexes, and log-structured merge trees on real datasets. Experiments show that PLI+ significantly delays clustering without degrading query performance, achieving a higher level of sortedness than unclustered indexes and log-structured merge trees. We also evaluate the quality of clustering by introducing a measure of interval sortedness, and we evaluate the size of the index.
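The write path described in the abstract (partitioning inserts in memory, then writing block-sized partitions) could look roughly like the sketch below. It is an illustration only, not the authors' implementation: the class `WritePartitioner`, the helper `boundaries_from_sample`, the quantile-based split points, and the in-memory `flushed` list standing in for disk writes are all assumptions introduced here.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def boundaries_from_sample(sample: Sequence[int], n_partitions: int) -> List[int]:
    """Pick split points at evenly spaced quantiles of a sample of incoming values.

    Assumption: the abstract only says partitioning follows the incoming
    data distribution; quantiles are one simple way to approximate that.
    """
    ordered = sorted(sample)
    return [ordered[i * len(ordered) // n_partitions] for i in range(1, n_partitions)]


class WritePartitioner:
    """Buffer incoming values per value range and emit only full blocks."""

    def __init__(self, boundaries: Sequence[int], block_rows: int):
        self.boundaries = list(boundaries)   # sorted split points
        self.block_rows = block_rows         # rows per block (hypothetical unit)
        self.buffers: Dict[int, List[int]] = {}
        # Stand-in for disk: (partition id, block contents) in flush order.
        self.flushed: List[Tuple[int, List[int]]] = []

    def insert(self, value: int) -> None:
        # Locate the partition whose value range contains this value.
        pid = bisect_right(self.boundaries, value)
        buf = self.buffers.setdefault(pid, [])
        buf.append(value)
        # Write a whole block at once to favor large-granularity I/O.
        if len(buf) == self.block_rows:
            self.flushed.append((pid, buf))
            self.buffers[pid] = []


if __name__ == "__main__":
    splits = boundaries_from_sample([5, 1, 9, 3, 7, 2, 8, 4, 6, 10], 2)
    partitioner = WritePartitioner(splits, block_rows=2)
    for v in [7, 1, 9, 2, 3]:
        partitioner.insert(v)
    print(partitioner.flushed)
```

Each flushed block holds values from a single value range, so the blocks are approximately sorted relative to one another even though rows inside a block keep their arrival order.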
Full Text Available
Improving Reproducibility of Distributed Computational Experiments

https://doi.org/10.1145/3214239.3214241

Pham, Quan; Malik, Tanu; That, Dai Hai; Youngdahl, Andrew (January 2018, 1st ACM HPDC Workshop on Practical Reproducible Evaluation of Computer Systems)

Full Text Available
Sciunits: Reusable Research Objects

https://doi.org/10.1109/eScience.2017.51

Ton That, Dai Hai; Fils, Gabriel; Yuan, Zhihao; Malik, Tanu (October 2017, IEEE 13th International Conference on e-Science (e-Science))

Full Text Available
PLI: Augmenting Live Databases with Custom Clustered Indexes

https://doi.org/10.1145/3085504.3085582

Wagner, James; Rasin, Alexander; That, Dai Hai; Malik, Tanu (January 2017, SSDBM '17 Proceedings of the 29th International Conference on Scientific and Statistical Database Management)

RDBMSes support only one clustered index per database table to speed up query processing. Database applications that continually ingest large amounts of data experience anything from slow query response times to long downtimes, because the clustered index ordering must be strictly maintained. In this paper, we show that application slowdown or downtime can often be avoided if database systems expose the physical location of attributes that are completely or approximately clustered. To this end, we propose PLI, a physical location index, constructed by determining the physical ordering of an attribute and creating approximately sorted buckets that map physical ordering to attribute values in a live database. To use a PLI, incoming SQL queries are simply rewritten with physical ordering information for that particular database. Experiments show that queries using the PLI significantly outperform queries using native unclustered (secondary) indexes, while the index itself requires much lower maintenance overhead than native clustered indexes.
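As a rough illustration of the idea in this abstract, the sketch below summarizes an attribute in physical order into buckets and rewrites a range query in terms of physical positions. It is a minimal sketch under stated assumptions, not the paper's implementation: fixed-size buckets, the names `Bucket`, `build_pli`, `rewrite_range`, and `to_sql`, and the pseudo-column `physical_pos` are all hypothetical. A real system would use whatever physical row identifier the DBMS exposes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Bucket:
    first_row: int  # first physical position covered (inclusive)
    last_row: int   # last physical position covered (inclusive)
    min_val: int    # smallest attribute value in the bucket
    max_val: int    # largest attribute value in the bucket


def build_pli(values: Sequence[int], bucket_size: int) -> List[Bucket]:
    """Scan values in physical order and summarize each fixed-size run."""
    buckets = []
    for start in range(0, len(values), bucket_size):
        chunk = values[start:start + bucket_size]
        buckets.append(Bucket(start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return buckets


def rewrite_range(buckets: List[Bucket], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Return merged physical ranges of buckets that may hold values in [lo, hi]."""
    ranges: List[Tuple[int, int]] = []
    for b in buckets:
        if b.max_val < lo or b.min_val > hi:
            continue  # bucket cannot contain a matching value
        if ranges and ranges[-1][1] + 1 == b.first_row:
            ranges[-1] = (ranges[-1][0], b.last_row)  # extend adjacent range
        else:
            ranges.append((b.first_row, b.last_row))
    return ranges


def to_sql(table: str, column: str, lo: int, hi: int,
           ranges: List[Tuple[int, int]]) -> str:
    """Build the rewritten query; physical_pos is a hypothetical pseudo-column."""
    if ranges:
        where = " OR ".join(f"physical_pos BETWEEN {a} AND {b}" for a, b in ranges)
    else:
        where = "1 = 0"  # no bucket can match
    # The original predicate stays, since buckets are only approximately sorted.
    return f"SELECT * FROM {table} WHERE ({where}) AND {column} BETWEEN {lo} AND {hi}"


if __name__ == "__main__":
    stored = [1, 3, 2, 5, 4, 6, 9, 8, 7, 12, 10, 11]  # approximately sorted
    pli = build_pli(stored, bucket_size=3)
    print(to_sql("readings", "ts", 5, 8, rewrite_range(pli, 5, 8)))
```

Because the buckets are only approximately sorted, the rewritten query keeps the original value predicate and uses the physical ranges purely to narrow the scan.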
Full Text Available
